Sub-lexical Modelling Using a Finite State Transducer Framework

Authors

  • Xiaolong Mou
  • Victor Zue
Abstract

The finite state transducer (FST) approach [1] has recently been widely used as an effective and flexible framework for speech systems. In this framework, a speech recognizer is represented as the composition of a series of FSTs that combine various knowledge sources across sub-lexical and high-level linguistic layers. In this paper, we use the FST framework to explore several sub-lexical modelling approaches, and propose a hybrid model that combines an ANGIE [2] morpho-phonemic model with a lexicon-based phoneme network model. These sub-lexical models are converted to FST representations and can be conveniently composed to build the recognizer. Our preliminary perplexity experiments show that the proposed hybrid model imposes strong constraints on in-vocabulary words while also providing detailed sub-lexical syllabification and morphological analysis of out-of-vocabulary (OOV) words. It therefore has the potential to offer good performance and to handle the OOV problem in speech recognition better.
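The idea of building a recognizer by composing FSTs can be illustrated with a minimal sketch. The code below is not the authors' implementation and does not reproduce ANGIE; it assumes epsilon-free transducers whose arc weights are additive costs, and the dictionary layout, the function names `compose` and `transduce`, and the toy symbols (`dx` for a flap, `onset`/`nucleus`/`coda` labels) are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical FST layout (an assumption for this sketch):
# {"start": state, "finals": set of states,
#  "arcs": {state: [(input, output, cost, next_state), ...]}}

def compose(a, b):
    """Compose two epsilon-free FSTs; costs on matched arcs are added."""
    start = (a["start"], b["start"])
    arcs = defaultdict(list)
    finals = set()
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a["finals"] and qb in b["finals"]:
            finals.add(state)
        for ia, oa, wa, na in a["arcs"].get(qa, []):
            for ib, ob, wb, nb in b["arcs"].get(qb, []):
                if oa == ib:  # output of a must match input of b
                    nxt = (na, nb)
                    arcs[state].append((ia, ob, wa + wb, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": dict(arcs)}

def transduce(fst, symbols):
    """Return (output tuple, total cost) for every accepting path."""
    results = []

    def walk(state, i, out, cost):
        if i == len(symbols):
            if state in fst["finals"]:
                results.append((tuple(out), cost))
            return
        for isym, osym, w, nxt in fst["arcs"].get(state, []):
            if isym == symbols[i]:
                walk(nxt, i + 1, out + [osym], cost + w)

    walk(fst["start"], 0, [], 0.0)
    return results

# Toy layer 1 (hypothetical): surface phones -> phonemes; a flap "dx"
# maps to /t/ at an illustrative cost of 0.5.
phones_to_phonemes = {"start": 0, "finals": {0}, "arcs": {0: [
    ("k", "k", 0.0, 0), ("ae", "ae", 0.0, 0),
    ("t", "t", 0.0, 0), ("dx", "t", 0.5, 0)]}}

# Toy layer 2 (hypothetical): phonemes -> syllable-position labels.
phonemes_to_labels = {"start": 0, "finals": {0}, "arcs": {0: [
    ("k", "onset", 0.0, 0), ("ae", "nucleus", 0.0, 0),
    ("t", "coda", 0.0, 0)]}}

cascade = compose(phones_to_phonemes, phonemes_to_labels)
analysis = transduce(cascade, ["k", "ae", "dx"])
```

Each layer stays a separate, inspectable transducer, and the composed machine maps directly from the lowest-level input symbols to the highest-level labels, which is the property the abstract relies on when it describes the sub-lexical models as conveniently composable.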

Similar resources

Context-dependent probabilistic hierarchical sublexical modelling using finite state transducers

This paper describes a unified architecture for integrating sub-lexical models with speech recognition, and a layered framework for context-dependent probabilistic hierarchical sublexical modelling. Previous work [1, 2, 3] has demonstrated the effectiveness of sub-lexical modelling using a core context-free grammar (CFG) augmented with context-dependent probabilistic models. Our major motivatio...

Klex: A Finite-State Transducer Lexicon of Korean

This paper describes the implementation and system details of Klex, a finite-state transducer lexicon for the Korean language, developed using XRCE’s Xerox Finite State Tool (XFST). Klex is essentially a transducer network representing the lexicon of the Korean language with the lexical string on the upper side and the inflected surface string on the lower side. Two major applications for Klex ...

A Non-deterministic Tokeniser for Finite-State Parsing

This paper describes a non-deterministic tokeniser implemented and used for the development of a French finite-state grammar. The tokeniser includes a finite-state automaton for simple tokens and a lexical transducer that encodes a wide variety of multiword expressions, associated with multiple lexical descriptions when required.

Using Genericity To Create Customizable Finite-State Tools

In this article we present the realization of a generic finite-state system. The system has been used to create concrete lexical tools for word form analysis, word form generation, creation and derivation history, and spellchecking. It will also be used to create a finite-state transducer for the recognition of phrases. Producing a finite-state component with the generic system requires little e...

Publication date: 2001